
Calhoun: The NPS Institutional Archive 
DSpace Repository 




2020-03 

USING DEEP CONVOLUTIONAL NEURAL 
NETWORKS TO CLASSIFY LITTORAL AREAS 
WITH 3-BAND AND 5-BAND IMAGERY 

Mielke, Ashley M. 

Monterey, CA; Naval Postgraduate School 
http://hdl.handle.net/10945/64930 
Downloaded from NPS Archive: Calhoun 





Calhoun is a project of the Dudley Knox Library at NPS, furthering the precepts and 
goals of open government and government transparency. All information contained 
herein has been approved for release by the NPS Public Affairs Officer. 

Dudley Knox Library / Naval Postgraduate School 
411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 


http://www.nps.edu/library



NAVAL POSTGRADUATE SCHOOL


MONTEREY, CALIFORNIA 


THESIS 


USING DEEP CONVOLUTIONAL NEURAL NETWORKS 
TO CLASSIFY LITTORAL AREAS WITH 3-BAND AND 
5-BAND IMAGERY 

by 

Ashley M. Mielke 
March 2020 

Thesis Advisor: Mara S. Orescanin 

Second Reader: Jeremy P. Metcalf 


Approved for public release. Distribution is unlimited. 







REPORT DOCUMENTATION PAGE 


Form Approved OMB 
No. 0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing 
instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of 
information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions 
for reducing this burden, to Washington headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson 
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project 
(0704-0188) Washington, DC 20503. 


1. AGENCY USE ONLY
(Leave blank)

2. REPORT DATE
March 2020

3. REPORT TYPE AND DATES COVERED
Master’s thesis

4. TITLE AND SUBTITLE
USING DEEP CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY
LITTORAL AREAS WITH 3-BAND AND 5-BAND IMAGERY

6. AUTHOR(S)
Ashley M. Mielke

11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the 
official policy or position of the Department of Defense or the U.S. Government. 


13. ABSTRACT (maximum 200 words) 

With the assistance of high-resolution satellites, unmanned aerial vehicles, and fixed camera 
observation points, coastal change detection and landscape classification are active research areas that have 
the capability to provide situational awareness. However, classification of bottom types in littoral waters is 
an area of coastal landscape classification that has not been studied extensively, and accurate and timely 
classification of bottom types remains elusive. Furthermore, it is unclear whether 5-band imagery (RGB, or
red, green, blue; along with near infrared and RedEdge) will help deep convolutional neural networks
(DCNNs) classify bottom types more easily than color (RGB) imagery alone. In this study, a DJI Inspire unmanned aerial
vehicle equipped with a MicaSense RedEdge sensor was used to obtain 5-band imagery of several coastal
areas. These images were classified by various means into six classes: swash zone, sandy bottom, bottom other
than sand, sand, kelp, and above ground rock. This database was then used to train the DCNN for
classification on unseen imagery. The models were first initialized with RGB data and then compared to the 
5-band outputs. DCNNs were able to classify littoral areas with more accuracy using 5-band imagery than 
3-band imagery. Further studies can apply the methods developed in this research and compare 5-band 
imagery obtained from unmanned aerial systems with imagery obtained from high-resolution satellites such 
as WorldView 3. 


5. FUNDING NUMBERS

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Naval Postgraduate School
Monterey, CA 93943-5000

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)
N/A

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

12a. DISTRIBUTION / AVAILABILITY STATEMENT
Approved for public release. Distribution is unlimited.

12b. DISTRIBUTION CODE
A

14. SUBJECT TERMS
machine learning, neural networks, bottom type, semantic segmentation, remote sensing,
data processing, artificial intelligence, deep learning, unmanned aerial vehicles, Carmel
River, unmanned systems, littoral zone, littorals

15. NUMBER OF PAGES
69

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18































Approved for public release. Distribution is unlimited. 


USING DEEP CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY 
LITTORAL AREAS WITH 3-BAND AND 5-BAND IMAGERY 


Ashley M. Mielke 

Lieutenant Commander, United States Navy 
BS, University of North Carolina at Charlotte, 2007 


Submitted in partial fulfillment of the 
requirements for the degree of 


MASTER OF SCIENCE IN METEOROLOGY AND PHYSICAL 

OCEANOGRAPHY 

from the 

NAVAL POSTGRADUATE SCHOOL 
March 2020 


Approved by: Mara S. Orescanin 
Advisor 


Jeremy P. Metcalf 
Second Reader 


Peter C. Chu 

Chair, Department of Oceanography 






ABSTRACT 


With the assistance of high-resolution satellites, unmanned aerial vehicles, and 
fixed camera observation points, coastal change detection and landscape classification are 
active research areas that have the capability to provide situational awareness. However, 
classification of bottom types in littoral waters is an area of coastal landscape 
classification that has not been studied extensively, and accurate and timely classification 
of bottom types remains elusive. Furthermore, it is unclear whether 5-band imagery
(RGB, or red, green, blue; along with near infrared and RedEdge) will help deep
convolutional neural networks (DCNNs) classify bottom types more easily than color
(RGB) imagery alone. In this study, a DJI Inspire unmanned aerial vehicle equipped with a
MicaSense RedEdge sensor was used to obtain 5-band imagery of several coastal areas. These
images were classified by various means into six classes: swash zone, sandy bottom, bottom
other than sand, sand, kelp, and above ground rock. This database was then used to train
the DCNN for classification on unseen imagery. The models were first initialized with 
RGB data and then compared to the 5-band outputs. DCNNs were able to classify littoral 
areas with more accuracy using 5-band imagery than 3-band imagery. Further studies can 
apply the methods developed in this research and compare 5-band imagery obtained from 
unmanned aerial systems with imagery obtained from high-resolution satellites such as 
WorldView 3. 





TABLE OF CONTENTS 


I. MOTIVATION
II. INTRODUCTION
III. METHODOLOGY
    A. IMAGE COLLECTION
    B. IMAGE PROCESSING
    C. IMAGE LABELING
    D. DEEP LEARNING TRAINING
IV. RESULTS
    A. SAND
    B. BOTTOM OTHER THAN SAND (BOTS)
    C. SANDY BOTTOM (SB)
    D. SWASH ZONE (SZ)
    E. KELP
    F. ABOVE GROUND ROCK (AGR)
    G. CLASS ACCURACY COMPARISON
V. DISCUSSION
VI. CONCLUSION
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST





LIST OF FIGURES 


Figure 1. Diversity of different ocean bottom types. Source: Armenteros and Saladrigas (2018).
Figure 2. Example of a CNN with 5 layers. Source: O’Shea and Nash (2015).
Figure 3. U-Net architecture developed by Ronneberger et al. Source: Ronneberger et al. (2015).
Figure 4. ENVINet 5 architecture. Source: Harris Geospatial Solutions (2019).
Figure 5. Carmel River State Beach, the site of image acquisition. Source: Scooler (2017).
Figure 6. MicaSense RedEdge-M attached to DJI Inspire 1 via 3-D printed resin cradle.
Figure 7. Depiction of the 75% overlap, used to minimize artifacts in images. Source: MicaSense Inc. (2019).
Figure 8. MicaSense RedEdge-M multi-spectral camera. Source: MicaSense Inc. (2019).
Figure 9. n-D visualizer tool. Source: Harris Geospatial Solutions.
Figure 10. Visual depiction of the scatter plot tool. Source: Harris Geospatial Solutions.
Figure 11. ENVI Deep Learning workflow. Source: Harris Geospatial Solutions, Inc. (2019).
Figure 12. Labels for each of the six classes.
Figure 13. Hand-labeled rocky image.
Figure 14. Hand-labeled beach image.
Figure 15. Rocky image with 3-band (bottom left) and 5-band (bottom right) MLC outputs.
Figure 16. Beach image with 3-band (bottom left) and 5-band (bottom right) MLC outputs.
Figure 17. Ground truth sand image on left with 3-band versus 5-band classification.
Figure 18. Ground truth BOTS image on left with 3-band versus 5-band classification.
Figure 19. Ground truth SB image on left with 3-band versus 5-band classification.
Figure 20. Ground truth SZ image on left with 3-band versus 5-band classification.
Figure 21. Ground truth kelp image on left (false color) with 3-band versus 5-band classification.
Figure 22. Ground truth AGR image on left with 3-band versus 5-band classification.
Figure 23. Accuracy comparison of kelp in the 5-band (top) and 3-band (bottom) models.
Figure 24. Accuracy comparison of SB in the 5-band (top) and 3-band (bottom) models.
Figure 25. Original image of AGR.
Figure 26. AGR false color image with NIR as red, Red Edge as green and red as blue.














LIST OF TABLES 


Table 1. Rule classifier thresholds.
Table 2. Model information for each of the six classes.
Table 3. Accuracies for the rocky image, as seen in Figure 15.
Table 4. Accuracies for the sandy image.










LIST OF ACRONYMS AND ABBREVIATIONS 


AGR     Above ground rock
BOTS    Bottom other than sand
CNN     Convolutional Neural Network
CRSB    Carmel River State Beach
DJI     Da Jiang Innovations
DCNN    Deep Convolutional Neural Network
GPU     Graphics Processing Unit
MLC     Maximum Likelihood Classification
NOAA    National Oceanic and Atmospheric Administration
ReLU    Rectified Linear Unit
SB      Sandy Bottom
SZ      Swash zone
UASs    Unmanned Aerial Systems





I. MOTIVATION


The environment plays a crucial role in naval operations. Decision makers 
must have a clear understanding of environmental conditions when deploying forces 
abroad, potentially into harm’s way. The littoral environment plays an especially 
important role in this decision-making process as the transition between open water 
operations and land-based operations, with amphibious landings providing the 
critical link between the two. The 2015 revision of A Cooperative Strategy for 21st 
Century Seapower emphasized critical capabilities that must be possessed by current 
and future forces in order to retain and improve America’s warfighting advantage in 
the littorals (Mabus 2015). 

The first of these capabilities is all-domain access. Due to the rise of
denied-environment challenges, capabilities that enable our forces and allies to gain and
maintain access must be prioritized (Mabus 2015). A second capability, the ability to
protect austere expeditionary bases in denied environments, depends directly on having
comprehensive knowledge of the littorals. This second capability falls under the sea
control and power projection category, where the
capability of amphibious forces to seize, establish, sustain, and secure such bases must be
improved (Mabus 2015). Both capabilities require a deep understanding of the littoral
environment that our forces will be charged with controlling.

Within the littoral environment, knowledge of the sea floor and obstacles 
within the water column is fundamental to the planning of operations. The 
composition of the sea floor, whether shaped by sand, mud, coral reefs, seagrass, or 
rock (see Figure 1), plays a large role in the planning process of amphibious landings, 
as does the presence of vegetation and bioluminescent organisms (for night 
operations). The type of sediment is also important in mine warfare and the burial of 
mines (Holland et al. 2002). 





Gaining a deeper understanding of this environment is critical to the success of 
future operations. The intent of this research is to provide the warfighter a tool to 
categorize the bottom type of littorals quickly and decisively in any area around the 
world. 



Figure 1. Diversity of different ocean bottom types. Source: Armenteros and Saladrigas (2018).



II. INTRODUCTION 


The ability to classify bottom types correctly in littoral regions is important 
for many reasons. Ecologists use this information for a deeper understanding of the 
distribution of fauna and flora in a given area (De Juan et al. 2013), and fishing and 
dredging industries use this information to optimize performance (Bostater and 
Rotkiske 2018). Classification of bottom types is important to both commercial and 
government interests. The Coast Guard needs to understand bottom types for safe 
navigation, NOAA uses bottom type information in charts used by the Navy, and 
military leaders need accurate bottom type information to conduct amphibious 
operations (craft to shore). Although many industries rely on accurate bottom type 
information, classifying bottom types in littoral areas is not easy. Visible light cannot 
penetrate to the depths of the ocean bottom unless the water is clear, which, due to 
waves and sediment, is rarely the case. Also, having enough (and the right type of) 
assets to reconnoiter the areas that require monitoring can be challenging. 

Currently, coastal areas are monitored in a few different ways. Fixed camera
observation points are prevalent in many areas and provide
a near-constant monitoring capability; however, their ability to see to the bottom of
the water column is limited. Buoys are also being outfitted with cameras that
can observe fog, cloud cover, surface currents, and sea state (Kohler
et al. 2016). Satellites are another remote sensing capability in the littoral area. One
such satellite, Landsat 8, launched in 2013 and operated by the USGS, collects
multispectral imagery (wider bands covering large swaths of the electromagnetic (EM)
spectrum), as opposed to hyperspectral imagery (much narrower bands covering smaller
slices of the EM spectrum) (Madonsela et al. 2017).

Finally, unmanned vehicles, including unmanned aerial vehicles (UAVs),
unmanned surface vehicles, and unmanned underwater vehicles, are increasingly being
used to research and catalog littoral regions. Small UAVs (like the DJI Inspire 1
used in this study) are becoming more prevalent and are being equipped
with advanced sensors that can gather imagery of unprecedented
scope and resolution. Imagery obtained in this study achieved a spatial resolution of
5.2 cm per pixel.

UAVs offer many advantages for aerial imagery compared to
fixed camera observation points and satellite imagery, although recent advances in
panchromatic high-resolution imagery from satellites such as WorldView 3 are
beginning to bridge the gap. The resolution that UAVs can provide has already been
noted, but in addition, UAVs can easily access areas that are hard to reach or
dangerous to humans. UAVs also provide timely imagery, and most of
them can be set up and launched for a mission within thirty minutes. A researcher
may have to wait days, or even weeks, to obtain high-resolution satellite imagery for
a given area, depending on funding and the priority given to the research.
However, WorldView 3’s hypertemporal resolution of one day can rival that of
UAVs (Collin et al. 2019). In addition, weather such as clouds and fog is a
persistent feature in some areas, so several successive satellite passes may go by before a
usable image can be obtained. Previously, researchers had to conform to the remote
sensors’ spatial and temporal resolution. Now, with UAVs, the remote sensing
package can be tailored to the researchers’ needs (Hugenholtz et al. 2013).

Due to their ease of use and relatively low cost, UAVs are being used for 
many different types of research and observation, including natural resource 
assessment, environmental monitoring, forest inventories, surveying, river corridor 
monitoring, plume tracking, wildlife management, avalanche patrols, precision 
agriculture, law enforcement, firefighting, border patrol, disaster relief, and even 
search and rescue (Flynn and Chapra 2014). 

Recently, the question has shifted from which assets are the right type, or how
many assets are appropriate for adequate surveillance of a given area, to what is
the best and most efficient way to process, evaluate, and
disseminate the data. With the rapid increase in the quality of the cameras
onboard unmanned vehicles, coupled with the increase in the file size of the imagery,
it has become increasingly important to have the right infrastructure in place to
quickly import the recorded data and process it efficiently and accurately. For many
in the remote sensing industry, machine learning, and more specifically deep
learning, has become an increasingly popular research topic and is increasingly
being integrated into algorithm chains to process the data.

Artificial neural networks (ANNs) are machine learning models that are
loosely based upon biological nervous systems. They are made up of an input layer, hidden
layers, and an output layer. The hidden layers take the inputs from previous layers
and determine whether a random change will enhance or degrade the chances of the
input getting closer to the desired output state (O’Shea and Nash 2015). Deep learning
refers to a network having multiple hidden layers stacked together, each applying these
changes. Convolutional neural networks (CNNs; Figure 2) are very similar to
ANNs, but they are better suited for the extraction and classification of
features in imagery.

Figure 2. Example of a CNN with 5 layers. Source: O’Shea and Nash (2015).

A CNN has four basic elements: an input layer, where the pixels of
the image are held; a convolution layer, where the output of each node is computed as
the scalar product between its weights and a local region of the input; a pooling layer,
which downsamples in the spatial dimension and thereby reduces the number of parameters; and
a fully-connected layer, which takes the activations provided by the pooling layer and tries
to assign class scores to its inputs (O’Shea and Nash 2015). Most ANNs have
hidden layers that are fully connected, meaning every node in layer n is connected
to every node in layer n+1. This is computationally very expensive, which is one of the
reasons why ANNs handle feature extraction in images poorly. CNNs are not fully
connected in all hidden layers, which saves computational power and allows CNNs
to handle larger inputs such as images better than ANNs, which are normally
used on small inputs such as the images in the MNIST dataset (O’Shea and Nash 2015).
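The four elements described above can be illustrated with a minimal pure-Python sketch. This is not the network used in this study; the function names, the 6x6 test image, the edge-detecting kernel, and the fully-connected weights are all assumptions chosen only to make each stage visible. As in most deep learning frameworks, the "convolution" here is computed without flipping the kernel.

```python
def convolve2d(image, kernel):
    """Unpadded convolution layer: each output value is the scalar product
    of the kernel weights with one local region of the input."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Rectified linear unit: negative activations are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2, halving each spatial dimension."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def fully_connected(feature_map, weights):
    """Fully-connected layer: one class score per weight vector."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * x for w, x in zip(ws, flat)) for ws in weights]


# Hypothetical 6x6 single-band image: bright left half, dark right half.
image = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to a bright-to-dark vertical edge.
kernel = [[1, 0, -1]] * 3

features = relu(convolve2d(image, kernel))   # 4x4 feature map
pooled = max_pool2x2(features)               # 2x2 after downsampling
scores = fully_connected(pooled, [[0.25] * 4, [-0.25] * 4])
print(pooled, scores)
```

Note that the convolution stage reuses the same nine kernel weights at every image location, whereas the fully-connected stage needs one weight per input value per class, which is the cost difference discussed in the text.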

In 2018, Buscombe and Ritchie reported that another deep learning algorithm
and subset of CNNs, called deep convolutional neural networks (DCNNs), was
being used extensively in the area of image recognition. As with any new technology,
DCNNs have their advantages and drawbacks. One advantage is that the
images do not need to be modified as they do for other machine learning
algorithms. Another is that DCNNs generally continue to improve as more images
are ingested, whereas the performance of other machine learning algorithms levels off
after a certain point (Buscombe and Ritchie 2018). The disadvantages of DCNNs are that
they require extensive computational resources and a trained expert to run and
troubleshoot them (Buscombe and Ritchie 2018).

Before DCNNs, many different
algorithms were developed with the goal of quickly and accurately
classifying remote sensing data. In 2004, Qiu and Jensen developed a new algorithm,
dubbed a neuro-fuzzy system, which combined neural networks with fuzzy logic, which is
composed of if-then statements (Qiu and Jensen 2004). Then, in 2015, Morgan
et al. tested three algorithms against one another: Maximum Likelihood
Classification (MLC) (technically a statistical method rather than an ML algorithm,
but still comparable), Support Vector Machine (SVM), and an Artificial Neural
Network (ANN), to determine which would best classify seven land use/land
cover classes (sandy wetlands, sandy bare soils, field crops, water, aquatic vegetation,
fish ponds, and settlements). Using ENVI for pre- and post-processing of the images,
they found that MLC was more accurate than SVM and ANN (Morgan et al. 2015).




There have also been other efforts to characterize landscape changes using
machine learning. In 2015, Hoonhout et al. demonstrated the effectiveness of a
model, based upon the Smooth Support Vector Machine (SSVM) algorithm, to
classify five different coastal classes: object, sand, sky, vegetation, and water. After
training, the model correctly identified 93% of the pixels in the images used
for testing (Hoonhout et al. 2015). More recently, in 2018, Hermann demonstrated
the use of remote sensing imagery along with a DCNN to classify eight different
coastal landscapes with an accuracy of 95% (Hermann 2018).

Previous efforts to classify bottom types through remote sensing have used a
variety of sensors. In 2013, Tulldahl et al. demonstrated that the
combination of LiDAR (1064 nm (near infrared) and 532 nm (green) wavelength
lasers) and WorldView-2 (WV-2) satellite data (using all 8 of the satellite’s
multispectral bands: Coastal, Blue, Green, Yellow, Red, RedEdge, NIR1, and NIR2)
was able to better classify three general bottom type classes (Hard, Soft, and
SoftHiVeg) than either LiDAR or WV-2 satellite data alone. In the study by
Tulldahl et al., a machine-learning algorithm, random forest, was used to classify the
data. The random forest algorithm uses “a large number of individual decision trees
that operate as an ensemble. Each individual tree in the random forest spits out a class
prediction and the class with the most votes becomes [the] model’s prediction” (Yiu
2019). The combination of LiDAR and WV-2 data, with the random forest algorithm
used for classification, achieved an accuracy of 76 (Kappa = 71), versus just 59
(Kappa = 50) for LiDAR alone and just 54 (Kappa = 45) for WV-2 imagery alone
(Tulldahl et al. 2013).
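To make the voting rule and the two reported metrics concrete, the following is a minimal sketch, assuming the standard definitions of overall accuracy and Cohen's kappa computed from a confusion matrix (rows as reference classes, columns as predicted classes). The function names and the example numbers are illustrative assumptions and are not data from Tulldahl et al.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Random forest decision rule: the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Agreement beyond what chance alone would be expected to produce."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    # Chance agreement: sum over classes of (row total * column total) / N^2.
    expected = sum(
        sum(confusion[i]) * sum(confusion[r][i] for r in range(n))
        for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical votes from five trees for one pixel.
print(majority_vote(["Hard", "Soft", "Hard", "SoftHiVeg", "Hard"]))

# Hypothetical two-class confusion matrix (not from any cited study).
example = [[40, 10],
           [5, 45]]
print(overall_accuracy(example), cohen_kappa(example))
```

Kappa is lower than overall accuracy whenever some agreement is expected by chance, which is consistent with each Kappa value reported above being below its corresponding accuracy.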

In a study by Salamati et al. in 2012, a digital single lens reflex (DSLR)
camera was used to capture 370 color (RGB) images and NIR images (with a NIR
filter) of natural landscapes. These images were manually segmented and annotated
at the pixel level into one of ten classes (Building,
Cloud, Grass, Road, Rock, Sky, Snow, Soil, Tree, and Water). A conditional
random field was trained to classify the images, which were
separated into two datasets: one with RGB images only and the other
with RGB plus NIR imagery (Salamati et al. 2012). The RGB plus NIR dataset
improved the classification accuracy of 7 out of the 10 classes.

This study uses a different framework than
the previously discussed studies. One of the aims of this research is to study whether
multispectral Red Edge and NIR imagery can improve the determination of bottom
type. Images were organically collected via a quadcopter UAV, then processed in
ENVI, an imagery analysis software package developed by Harris Geospatial Solutions, Inc.
Images were registered and labeled, then sent to the ENVI deep learning module to
train a neural network to recognize the specified classes on its own.

The architecture of the neural network used in the module, ENVINet 5, “is
based on the U-Net architecture developed by Ronneberger, Fischer, and Brox
(2015)” (Harris 2019). The U-Net architecture (Figure 3) that ENVINet 5 is based
upon is a modified fully convolutional network that is designed to need fewer training
images and still give “precise segmentations” (Ronneberger et al. 2015). The U-Net
architecture has two sides, a contracting path and an expansive path.


Figure 3. U-Net architecture developed by Ronneberger et al. Source: Ronneberger et al. (2015).


On the contracting path, 3x3 convolutions are applied “followed by a rectified 
linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling”, 
(Ronneberger et al. 2015). On the expansive path, the feature map is upsampled, 
“followed by a 2x2 convolution (‘up-convolution’) that halves the number of feature 
channels, a concatenation with the correspondingly cropped feature map from the 
contracting path, and two 3x3 convolutions, each followed by a ReLU” (Ronneberger 
et al. 2015). This U-Net architecture described by Ronneberger, Fischer, and Brox 
has 23 layers, and ENVINet 5 (Figure 4) also has 23 layers, with a TensorFlow model 
added on top of the network. 
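The effect of the two paths on tile size, and the layer count, can be checked with a short arithmetic sketch. It assumes the configuration of the original U-Net paper (unpadded 3x3 convolutions, four downsampling steps, a 572 x 572 input tile); these values are assumptions taken from that paper, and ENVINet 5 may be configured differently. The function names are hypothetical.

```python
def unet_tile_sizes(input_size=572, depth=4):
    """Trace the side length of a square tile through a U-Net that uses
    unpadded 3x3 convolutions, 2x2 max pooling, and 2x2 up-convolutions."""
    size = input_size
    trace = [("input", size)]
    for level in range(1, depth + 1):
        size -= 4  # two unpadded 3x3 convolutions trim 2 pixels each
        trace.append((f"contracting {level}: conv", size))
        if size % 2:
            raise ValueError("tile size must be even before 2x2 pooling")
        size //= 2  # 2x2 max pooling with stride 2
        trace.append((f"contracting {level}: pool", size))
    size -= 4  # two convolutions at the bottom of the "U"
    trace.append(("bottom: conv", size))
    for level in range(depth, 0, -1):
        size = size * 2 - 4  # up-convolution doubles, two convs trim 4
        trace.append((f"expansive {level}: up-conv + conv", size))
    return trace


def count_conv_layers(depth=4):
    """Convolutional layers: two per level on each path, two at the bottom,
    one up-convolution per level, and the final 1x1 convolution."""
    return 2 * depth + 2 + depth + 2 * depth + 1


for stage, size in unet_tile_sizes():
    print(f"{stage:32s} {size}")
print("convolutional layers:", count_conv_layers())
```

Under these assumptions the output segmentation map is smaller than the input tile, which is why the feature maps copied from the contracting path must be cropped before concatenation.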

TensorFlow is an open-source machine learning library developed by Google. In the ENVI
deep learning module, a TensorFlow model looks for
specific features using label rasters that show “known samples of the feature” (Harris
2019). Once the TensorFlow model has been trained on these rasters to find specific
features, it can classify features in other images (Harris 2019). TensorFlow uses
computational graphs to represent the algorithms and is being lauded for “its ability
to perform fast automatic gradient computation, its inherent support for distributed
computation and specialized hardware as well as its powerful visualization tools”
(Goldsborough 2016).



Figure 4. ENVINet 5 architecture. Source: Harris Geospatial Solutions (2019).


The hypothesis of this research is two-fold. The first hypothesis is that a
neural network, more specifically a convolutional neural network, ENVINet 5, can
be trained to accurately classify bottom types using UAS imagery. This will be tested
by compiling a littoral image database that expands upon an existing database that
was constructed for coastal zones (tidal flat, beach, marsh, etc.) and used to detect
coastal changes. The second hypothesis is that the multispectral Red
Edge and NIR EM bands improve the neural network’s ability to classify bottom
types versus RGB imagery alone. It is proposed that over the water the Red Edge and
NIR bands will classify vegetation much better due to the high reflectance
values healthy vegetation has in the IR wavelengths. Since most rocky formations
that have been in a habitat for a period have some sort of vegetation growing on them,
it is proposed that this method will also accurately identify rocky bottoms. It
is imperative that there be a quick-look system in place to assess the current
state of the littoral area and catalog the changes that occur over time. This study aims
to be a step in that direction.





III. METHODOLOGY 


A. IMAGE COLLECTION 

A database of imagery with different bottom types was needed to develop a
neural network for classification. Carmel River State Beach (CRSB) was selected as
the image collection site, as there are a variety of bottom
types in the area. CRSB is located south of Monterey Bay, in the southern
portion of Carmel Bay (Figure 5). As can be seen from the figure, there are natural
rock formations on the northern point near Carmel-by-the-Sea and on the southern
portion of Carmel Bay adjacent to the area where the Carmel River meets the beach.
Kelp is almost always present in the area, as it is a staple of the littoral zone of the
west coast of the United States. As noted previously, the presence of kelp indicates
obstacles throughout the water column and suggests that there are rocks beneath, as kelp
adheres best to rocky substrates, although there are cases of kelp growing from
and adhering to sand.



Figure 5. Carmel River State Beach, the site of image acquisition. Source: Scooler (2017).



Over the course of two days, more than 1200 images were captured of the area just
offshore of CRSB. The Da-Jiang Innovations (DJI) Inspire 1 UAV was the platform
used to acquire the images, and it was outfitted with the MicaSense RedEdge-M (RE-M),
a professional multispectral camera (produced by MicaSense, Inc.) that was
designed for precise agricultural mapping.

The RE-M camera was secured to the Inspire 1 via a 3-D printed resin cradle
that attached to the battery compartment of the Inspire 1 (Figure 6). Instead of
drawing power from the Inspire 1, the RE-M was connected to a 5 V battery pack.



Figure 6. MicaSense RedEdge-M attached to DJI Inspire 1 via 3-D printed resin cradle.

The RE-M captures images in the blue (475 nm center, 20 nm bandwidth), 
green (560 nm center, 20 nm bandwidth), red (668 nm center, 10 nm bandwidth), red 
edge (717 nm center, 10 nm bandwidth) and NIR (840 nm center, 40 nm bandwidth) 
spectral bands. 
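For reference, the band specifications above can be held in a small data structure. The sketch below assumes that each stated bandwidth is the full width of the band centered on the stated center wavelength; the `Band` type and `RE_M_BANDS` name are hypothetical, and the list order follows the sentence above rather than any file or channel numbering.

```python
from typing import NamedTuple


class Band(NamedTuple):
    name: str
    center_nm: float
    bandwidth_nm: float

    @property
    def range_nm(self):
        """Lower and upper wavelength, assuming a band centered on center_nm."""
        half = self.bandwidth_nm / 2
        return (self.center_nm - half, self.center_nm + half)


# Values as stated in the text for the RE-M.
RE_M_BANDS = [
    Band("blue", 475, 20),
    Band("green", 560, 20),
    Band("red", 668, 10),
    Band("red edge", 717, 10),
    Band("NIR", 840, 40),
]

for band in RE_M_BANDS:
    low, high = band.range_nm
    print(f"{band.name:9s} {low:.0f}-{high:.0f} nm")
```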


Figure 7. Depiction of the 75% overlap, used to minimize artifacts in images. Source: MicaSense Inc. (2019).


Flights with the RE-M were flown with the camera at near-nadir and with a 
75% forward overlap (Figure 7). Conducting flights with the 75% overlap is 
important to be able to cut down on artifacts appearing in the collected imagery and 
aids in image-to-image alignment, keeping stitching errors low in the post-processed 
images. Lastly, the RE-M comes equipped with a downwelling light sensor (DLS). 
The DLS aids in correcting for lighting changes midflight by measuring the 
downwelling sunlight for each band and storing this information in the metadata. 
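The relationship between forward overlap and the distance flown between captures can be sketched as follows. The 5.2 cm per pixel resolution is the value reported earlier in this thesis; the 960-pixel along-track image dimension is an assumption for illustration and should be checked against the camera documentation. The function names are hypothetical.

```python
def footprint_m(pixels, gsd_m_per_pixel):
    """Ground distance covered by one image dimension."""
    return pixels * gsd_m_per_pixel


def capture_spacing_m(footprint_along_track_m, forward_overlap):
    """Distance between consecutive captures for a given forward overlap,
    where the overlap is a fraction of the along-track footprint."""
    if not 0 <= forward_overlap < 1:
        raise ValueError("forward_overlap must be in [0, 1)")
    return footprint_along_track_m * (1 - forward_overlap)


along_track = footprint_m(pixels=960, gsd_m_per_pixel=0.052)  # assumed 960 px
spacing = capture_spacing_m(along_track, forward_overlap=0.75)
print(f"footprint {along_track:.2f} m, capture every {spacing:.2f} m")
```

With 75% overlap, each point on the ground along the flight line appears in roughly four consecutive images, which is what gives the stitching software redundant views to align.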

B. IMAGE PROCESSING 

The raw imagery from the RE-M is not registered, meaning that every
capture taken by the camera produces 5 separate images, one
for each band (Green, Blue, Red, NIR, RE). Because the RE-M has 5 separate focal
planes (Figure 8), each capturing one of the five bands from a different location on
the camera, the five images of each capture are slightly offset spatially. On
the computer, these images each have different file identifiers. For example, image 1
would consist of 5 files, img001_1.tif, img001_2.tif, img001_3.tif, img001_4.tif, and
img001_5.tif, each signifying one of the 5 bands. In order to properly classify the
images, it is necessary to combine these images into one multi-channel .tif image.



Figure 8. MicaSense RedEdge-M multi-spectral camera. Source: MicaSense Inc. (2019). 

Interactive Data Language (IDL), a scientific programming language also 
developed by Harris Geospatial Solutions, Inc., was used to register the images. IDL 
is the programming backbone of ENVI. It must be noted that there exists a large 
repository of information on how to process RE-M imagery (along with MicaSense 
Inc.’s other, newer cameras, the RedEdge-MX and Altum) with IDL and ENVI at 
https://github.com/envi-idl/UAVToolkit. Once the images were combined into 
multi-channel (5 band) .tif images, they were ready for labeling and classification. 
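The band-combining step can be pictured with a short Python sketch. It is only an illustration, not the IDL/ENVI procedure used in this study: it assumes the five band images have already been read into equally sized 2-D lists, and it performs no registration (spatial alignment) at all. The function names `band_filenames` and `stack_bands` are hypothetical.

```python
def band_filenames(stem, n_bands=5):
    """Return the per-band file names for one capture, e.g. img001_1.tif."""
    return [f"{stem}_{band}.tif" for band in range(1, n_bands + 1)]


def stack_bands(bands):
    """Interleave equally sized 2-D bands into a rows x cols x bands structure."""
    rows, cols = len(bands[0]), len(bands[0][0])
    for band in bands:
        if len(band) != rows or any(len(row) != cols for row in band):
            raise ValueError("all bands must share the same dimensions")
    return [[[band[r][c] for band in bands] for c in range(cols)]
            for r in range(rows)]


# Tiny 2 x 2 stand-ins for the five single-band images of one capture
# (illustrative values only, not real sensor data).
bands = [[[b, b], [b, b]] for b in range(1, 6)]
cube = stack_bands(bands)
```

Each pixel of `cube` then carries one value per band, which is the layout a multi-channel image provides to the labeling and classification steps.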

C. IMAGE LABELING 

For this study, the images were separated into six different classes: sandy 
bottom (SB), sand, above ground rock (AGR), kelp, swash zone (SZ), and bottom 
other than sand (BOTS). The BOTS classification was originally termed rocky 
bottom, but due to the ambiguity of the substrate in the images, it could not be 
determined that the objects seen on the bottom were rocks. In situ verification of the 
dark substrates composing the bottoms was not available. 





In total, 356 images were used for this study (47 AGR, 117 kelp, 57 sand, 40 
SZ, 66 BOTS, 29 SB). When training neural networks, it is important to keep a set of 
images that can be used to test the model and validate the classification accuracy. 
These are images that the model has never seen before, so the result indicates how 
well the model is weighted. In this study, 30% of the labels created were saved for 
validation. Once the images had been designated for training or validation, they 
were placed in separate folders, one for training and one for testing. With the 
images cataloged and organized, classification of the images began. 
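A hold-out split of this kind can be sketched in a few lines of Python. The thesis does not state how the validation labels were chosen, so the random shuffle, the fixed seed, the rounding, and the placeholder label names below are all assumptions made for illustration.

```python
import random


def split_labels(items, validation_fraction=0.30, seed=0):
    """Shuffle the items and hold out a fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical label names standing in for the 356 images in the study.
labels = [f"label_{i:03d}" for i in range(356)]
training, validation = split_labels(labels)
```

Keeping the two lists disjoint is the point of the exercise: a label that appears in both would make the validation result look better than it is.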

ENVI provides many ways to classify imagery, including supervised and 
unsupervised methods. Supervised classification methods use training data to train 
the algorithm. The training data (within ENVI) consists of regions of interest (ROIs) 
that group similar pixels that the user has identified. The supervised classification 
algorithms available include MLC, Minimum Distance, Mahalanobis Distance, 
Spectral Angle Mapper, and Support Vector Machine, among others. Unsupervised 
classification does not use training data to train the algorithm, and for this type, 
ENVI has two methods available: Iterative Self-Organizing Data Analysis Technique 
(ISODATA) and K-Means classification. 

These supervised and unsupervised classification methods are all viable 
options, but since the goal was to use deep learning and, more specifically, a DCNN 
(due to its aforementioned innate ability to extract features from images), the 
TensorFlow model (which uses ENVINet 5) was chosen as the algorithm to train on 
the images. In order to create the training data, images were classified with a 
variety of methods to identify the various features in each image and to build up a 
large enough library of each bottom type and feature of interest. 

Initially, only the bottom types of sandy bottom, bottom other than sand, and 
kelp were considered for classification, but further discussion led to the additional 
classes of sand, above ground rock, and swash zone. Since this research is focused 
on the littoral ocean and the features of interest (kelp, SB, BOTS) are so close to the 
shore, the sand, above ground rock, and swash zone classes were included in the 
study so that models trained for the features of interest would not create false 
positives on those shore-based features. A full-spectrum classification of the 
littoral area, including the beach, was determined to be the most beneficial for 
getting a complete picture of the area. 

There were three methods utilized to label the different classes in the images. 
The first method, creating a Region of Interest (ROI) and then sending the ROI to the 
n-Dimensional (n-D) visualizer tool, was used to label kelp (Figure 9). The n-D 
visualizer is a tool that takes all pixels contained in an ROI and displays them in an 
n-D plot, where n represents the number of bands that are visualized. Since kelp 
reflects strongly in the NIR wavelengths, this band along with red and green were 
displayed. The pixels that signified kelp were concentrated along the infrared axis, 
which enabled kelp to be labeled in a manner consistent with its shape, which in turn 
helps the model recognize kelp. Hand labeling of kelp is not as accurate as the n-D 
visualizer method because the small width of kelp blades is difficult to label 
accurately by hand. 






Band 3 is red, band 4 is RedEdge and band 5 is NIR. The green region, lying in the 
plane of 4 and 5, was selected because it is the area with many of the kelp pixels. 

Figure 9. n-D visualizer tool. Source: Harris Geospatial Solutions 


A second labeling method, used for the classes of sand and swash zone, was 
hand classification. Hand classification was used for these areas because they are 
largely homogeneous and other classes are hardly ever commingled with these two 
classes. Therefore, the ROI tool was used to draw polygons around the areas of 
interest, being careful not to include pixels of other classes. The n-D visualizer 
could still have been used to label these classes, but since they do not reflect well 
in the NIR or red edge wavelengths, only in RGB, they are harder to distinguish 
from other classes that may be in the image. 

The third labeling method was using the grow function after creating a 
polygon ROI. The grow function expands the ROI into neighboring pixels, using a 
number of standard deviations as the threshold. The standard deviation value 
changed for each ROI, and in many of the labels, multiple ROIs were created in this 
fashion for different areas where the feature of interest was contained. Once an 
adequate number of features were represented, the separate ROIs were combined 
into one. This labeling method was used for BOTS, AGR, and SB. 

Another tool that was used in the labeling process was the 2-D scatter plot, 
which displays all pixels that are contained in the image on a 2-D plot, with 2 bands 
of the image as the x-axis and y-axis. In the figure below (Figure 10), the red band is 
plotted on the x-axis and the blue band is plotted on the y-axis. Different 
configurations were used to identify different areas. For example, when trying to 
find kelp, choosing NIR as the x-axis or y-axis would easily distinguish the pixels 
that contained kelp, as kelp has higher reflectance values in the NIR range than all 
other classification categories (sand, AGR, SZ, SB, BOTS). 



Band 3 is red and band 2 is green. The x and y axes are the intensities of each band. 

Figure 10. Visual depiction of the scatter plot tool. Source: Harris Geospatial Solutions. 









In the 2-D scatter plot, as in the n-D visualizer, groups of pixels can be circled, 
and these pixels then belong to a new ROI, which is differentiated by band intensity. 
The corresponding pixels in the image are simultaneously given the same label as the 
pixels that were circled on the scatter plot (the new ROI), giving the user instant 
feedback on whether the circled pixels in the scatter plot need to be adjusted. For 
instance, if it is recognized that two different classes were labeled the same color in 
the image (e.g., sand and rocky bottom), this would indicate that the circle of pixels 
drawn in the scatter plot was too liberal and needs to be tightened. 
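Selecting pixels in band space and seeing them highlighted in the image amounts to building a mask from a test on two band values. The sketch below is a simplification of that idea: it uses a rectangular box rather than a freehand circle, and the band indices, ranges, and pixel values are illustrative assumptions rather than values from this study.

```python
def select_by_band_range(image, band_x, band_y, x_range, y_range):
    """Return a 0/1 mask marking pixels whose two band values fall inside a box."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [[1 if (x_min <= pixel[band_x] <= x_max
                   and y_min <= pixel[band_y] <= y_max) else 0
             for pixel in row]
            for row in image]


# Illustrative 2 x 2 image with five values per pixel. Indices 2 and 4 are
# treated here as red and NIR (zero-based), which is an assumption.
image = [
    [[0.05, 0.08, 0.06, 0.10, 0.60], [0.30, 0.32, 0.35, 0.36, 0.38]],
    [[0.04, 0.07, 0.05, 0.12, 0.55], [0.02, 0.03, 0.02, 0.02, 0.01]],
]
# Low red and high NIR: the kind of region where vegetation pixels would cluster.
mask = select_by_band_range(image, 2, 4, (0.0, 0.1), (0.5, 1.0))
```

Tightening the box plays the same role as tightening the circle drawn on the scatter plot: fewer pixels from neighboring classes end up in the ROI.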

In these methods of labeling, the ROI is saved as an .xml file. There were two 
cases in which multiple ROIs were created and saved for a single image. One case 
is where there are multiple classes within an image. Each class would be labeled with 
an ROI (using the n-D visualizer for kelp) and saved in its respective folder. The 
other case is when there are multiple instances of a single class in an image, 
separated by other features. For example, AGR appeared in several locations within 
some images. Although every instance of AGR could have been saved as one ROI, 
creating and saving more ROIs as .xml files yields more training rasters to feed 
the model. 

Classification continued in this manner until all images were labeled. Once 
labeled, each individual class, within each image, was first saved as a ROI in an .xml 
file. This would facilitate the next step, creating label rasters from these ROIs and 
training the neural network. 

D. DEEP LEARNING TRAINING 

In ENVI, the TensorFlow based model can be trained using either a label 
raster (the saved ROIs in .xml format) or a classification image (Figure 11), which 
can be obtained from the unsupervised or supervised classification methods 
mentioned above. Label rasters are built from ROIs and contain all bands from the 
image as well as a binary mask that labels areas where the feature is present with a 1 
and areas where the feature is not present with a 0. 
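The structure of a label raster described above (image bands plus a 0/1 mask) can be sketched as follows. This is not the ENVI task that builds label rasters; it is a minimal illustration that assumes the ROI is available as a list of (row, column) pixel coordinates, and the function names are hypothetical.

```python
def build_label_mask(rows, cols, roi_pixels):
    """Return a rows x cols mask with 1 where the feature is present, else 0."""
    roi = set(roi_pixels)
    return [[1 if (r, c) in roi else 0 for c in range(cols)]
            for r in range(rows)]


def append_mask(image, mask):
    """Append the mask as one extra channel after the image bands."""
    return [[pixel + [mask[r][c]] for c, pixel in enumerate(row)]
            for r, row in enumerate(image)]


# Illustrative 2 x 2 image with three bands per pixel.
image = [[[10, 20, 30], [11, 21, 31]],
         [[12, 22, 32], [13, 23, 33]]]
mask = build_label_mask(2, 2, [(0, 1), (1, 1)])
label_raster = append_mask(image, mask)
```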


[Figure 11 diagram text, recovered from the flowchart] 

Legend: Data preparation; Training; Classification; Optional step 

Use existing binary classification image 
Build Label Raster From Classification (API: ENVIBuildLabelRasterFromClassificationTask) 
OR 
Collect training samples from images using ROIs 
Use the ENVI Feature Counting tool to identify features in images (API: ENVIFeatureCountToROITask) 
Build Label Raster From ROI (API: ENVIBuildLabelRasterFromROITask) 

Initialize ENVINet5 Model (API: ENVIInitializeENVINet5ModelTask) 
Build Deep Learning Raster (API: ENVIBuildDeepLearningRasterTask) 
Train TensorFlow Mask Model (API: ENVITrainTensorFlowMaskModelTask) 
TensorFlow Mask Classification (API: ENVITensorFlowMaskClassificationTask) 

Class Activation to Pixel ROI (API: ENVIClassActivationToPixelROITask) 
OR 
Class Activation to Polygon ROI (API: ENVIClassActivationToPolygonROITask) 
OR 
Class Activation to Classification (API: ENVIClassActivationToClassificationTask) 

Optional: Edit ROIs and build a new label raster 

Figure 11. ENVI Deep Learning workflow. Source: Harris Geospatial Solutions, Inc. (2019). 


To test the second hypothesis that 5-band imagery (RGB color plus Red Edge 
and NIR EM bands) would improve the neural network’s ability to classify the littoral 
area versus RGB imagery alone, it was necessary to create 3-band models using the 
same input images as the 5-band models. This way, a direct comparison between the 
3-band and 5-band models could be achieved. A sample of the labels created for each 
class is displayed in Figure 12. The labels in the figure are all 5-band labels; however, 
the 3-band labels are identical. Before the model can start to train on the label rasters, 
it must be initialized with the desired parameters. 


[Figure 12: sample label images, one panel per class] 

The colors used in each image are for viewing ease and do not represent any specific 
variable. 

Figure 12. Labels for each of the six classes 











The first parameter to define is the patch size. A patch is a portion of the 
image that is sent to the model for training. The larger the patch size, the faster 
training can occur because the model can get through each image more quickly. The 
default patch size is 572 pixels; it can be made larger, but is limited by the amount 
of memory on the user’s graphics card. 
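To make the notion of a patch concrete, the sketch below lists where square patches of a given size would sit if an image were simply tiled without overlap. This is an assumption made for illustration only: ENVI selects patches according to its own sampling parameters, and the image dimensions in the example are hypothetical.

```python
def patch_origins(height, width, patch_size=572):
    """Top-left corners of non-overlapping patches that fit fully inside the image."""
    return [(r, c)
            for r in range(0, height - patch_size + 1, patch_size)
            for c in range(0, width - patch_size + 1, patch_size)]


# A hypothetical 1144 x 1716 pixel image holds a 2 x 3 grid of 572-pixel patches.
origins = patch_origins(1144, 1716)
```

An image smaller than the patch size yields no patches at all under this scheme, which is one reason the patch size cannot be increased arbitrarily.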

The next parameter is the number of bands in the image. For this study, it 
was important to see how well the model could classify 5-band imagery compared 
to 3-band imagery, so a pair of models was created for each class. First, the 5-band 
models were compiled (with 5-band imagery) to identify the classes of sand, kelp, 
SZ, AGR, BOTS, and SB; then 3-band models were compiled (with 3-band imagery) 
to identify the same classes for comparison. 

It is also important to note that, even with the same training rasters as inputs 
to the model, successive model runs will not yield the same results. This is due to the 
inherent randomness of deep learning and convolutional networks. Just as the human 
brain contains millions of neurons creating infinite combinations of paths to and from 
dendrites, neural networks are fashioned in the same manner, and no two model runs 
will be identical. However, if the same input parameters are used, the differences 
will be minor. For the additional parameters, ENVI also provides a feature that 
randomizes their values when the user is unsure of the best values to input. 

The parameters discussed above are the only mandatory parameters; the rest 
are optional, as ENVI will use default values for them unless the user specifies 
otherwise. For this study, the default values were used for the number of epochs 
(25), patch sampling rate (16), number of patches per epoch (300), and number of 
patches per batch (9). Model runs took an average of seven minutes to complete and 
were run on a Dell Precision 7910 with an Intel Xeon E5-2667 processor clocked at 
3.2 GHz and an NVIDIA Titan V GPU. 

There were a total of 12 models created: 6 models, one for each of the classes 
(sand, SZ, AGR, SB, BOTS, and kelp), utilizing 5-band imagery and another 6 
models, one for each of the classes, using 3-band (Blue, Green, Red) imagery. The 
output of the model is a class activation raster, which is a greyscale image in which 
the brightness of each pixel corresponds to the probability that the feature is 
present, with white being a positive match and black being no match. 

The models were constantly refined and retrained. The class activation raster 
(the output of the model) was compared to the input raster; areas where the model 
correctly identified the feature were put into a new ROI, saved as an .xml file, and 
sent into another model run so that the accuracy of the model increased with each 
iteration. 

In order to compare classes, a confusion matrix was generated for each class 
that compared how well the 3-band and 5-band models predicted that class in two 
separate images. Both images were fully hand labeled (Figures 13 and 14), but since 
every pixel in the image could not be hand labeled, the hand-labeled image was run 
through an MLC algorithm. 




In this image, pink represents sand, green represents SZ, yellow represents SB, royal blue 
represents BOTS, turquoise represents kelp, and red represents AGR. 

Figure 13. Hand-labeled rocky image 






In this image, maroon represents sand, aqua represents SZ, red represents SB, yellow 
represents BOTS, and green represents kelp. Note that there is no AGR present in 
this image. 


Figure 14. Hand-labeled beach image 

In the rocky image (Figure 15), both the kelp (dark blue) and AGR (red) are 
better defined in the 5-band MLC output. In the beach image (Figure 16), the kelp 
(green) is better defined. The MLC outputs are similar because the hand labels sent 
to each MLC algorithm (one for the 3-band image and one for the 5-band image) 
were the same. 


Figure 15. Rocky image with 3-band (bottom left) and 5-band (bottom right) MLC outputs 


Figure 16. Beach image with 3-band (bottom left) and 5-band (bottom right) MLC outputs 

In order to compare the accuracies for each class, each class within the MLC 
outputs was compared to the rule classified result for that class. The rule classifier 
was generated by combining the model outputs for all of the classes in an image. 
Therefore, the rule classifier for the rocky image had 6 layers (6 classes), whereas 
the rule classifier for the sandy image had 5 layers (5 classes). Within the rule 
classifier, thresholds were specified at which to display each of the classes: pixels 
with values above the threshold are labeled as within the class, and pixels with 
values below the threshold are labeled as not in the class. Thresholds were obtained 
by observation of the color slice rasters (Figures 17-22) of the class activation image 
of each class. Table 1 shows the thresholds that were determined for each class. 
These were applied equally to the 3-band and 5-band images in order to generate 
accuracies, which were the comparison of the ground truth MLC output with the 
rule classified image. The thresholds turned a class activation map into a binary 
class map, in which each pixel was either inside or outside the class. 


Table 1. Rule classifier thresholds 

Class   Threshold 
SZ      0.7 
SB      0.42 
Sand    0.68 
Kelp    0.61 
BOTS    0.14 
AGR     0.35 

Thresholds that were determined for each class using color 
slice of the class activation raster generated by the DCNN 
models after classifying each class within an image. 
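The thresholding step and the comparison against ground truth can be sketched as follows, using the Table 1 values. The accuracy definition here is an assumption: the thesis does not give a formula, so the sketch uses the fraction of ground-truth class pixels that the thresholded model output also marks. The activation and truth values are illustrative, and the function names are hypothetical.

```python
# Thresholds from Table 1.
THRESHOLDS = {"SZ": 0.7, "SB": 0.42, "Sand": 0.68,
              "Kelp": 0.61, "BOTS": 0.14, "AGR": 0.35}


def apply_threshold(activation, threshold):
    """Turn a class activation raster into a 0/1 class map."""
    return [[1 if value > threshold else 0 for value in row]
            for row in activation]


def class_accuracy(predicted, truth):
    """Fraction of ground-truth class pixels also marked in the prediction
    (an assumed metric, not necessarily the one used in the study)."""
    truth_total = sum(sum(row) for row in truth)
    if truth_total == 0:
        return None  # class absent from the image, e.g. AGR in the beach image
    hits = sum(p and t
               for p_row, t_row in zip(predicted, truth)
               for p, t in zip(p_row, t_row))
    return hits / truth_total


# Illustrative 2 x 2 kelp activation raster and ground-truth mask.
activation = [[0.90, 0.65],
              [0.30, 0.62]]
truth = [[1, 1],
         [1, 0]]
kelp_map = apply_threshold(activation, THRESHOLDS["Kelp"])
kelp_accuracy = class_accuracy(kelp_map, truth)
```

Under this definition, a model that marks none of the ground-truth pixels scores 0, regardless of how many background pixels it leaves unmarked.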


After training, the models were then used to classify the images in the test 
dataset, which the models had never been trained/validated on. The results of these 
model runs are described in the next section. 





IV. RESULTS 


Two hypotheses were tested in this study. The first was that a CNN (in this 
case ENVINet 5) can be trained to accurately classify the littoral area using UAS 
imagery. The second was that the multispectral Red Edge and NIR spectral bands 
would improve the CNN’s ability to classify the littoral area versus using RGB 
alone. 

ENVINet 5 was able to classify littoral areas with varying degrees of 
accuracy. Table 2 shows the loss at exit and the number of training and validation 
images that each model was trained on. Loss at exit indicates how close the model 
came to the mean of the labels that it was being trained on. A lower loss meant that 
the model was able to get closer to the mean of the labels that were provided for 
training, and a higher loss conveys that the model could not get as close. To get the 
loss lower, more training images can be sent to the model to refine the result. 

Instead of ENVINet 5, the actual model file (.h5) trained for each class is named 
in each section. Each section also contains a figure with the original image on the 
left, the 3-band classification in the middle, and the 5-band classification on the right. 
Each classification image was adjusted to have 10 levels of accuracy, ranging from 
0-10% (purple) to 90-100% (red). The color bar to the right of the 5-band 
classification represents this accuracy. 
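The ten-level display can be thought of as binning each activation value into a 10% interval. The sketch below assumes activation values scaled to the range 0 to 1 and assigns a value on a bin edge to the higher bin; both are assumptions, and the function names are hypothetical.

```python
def confidence_bin(value, n_bins=10):
    """Map an activation in [0, 1] to a bin index from 0 (0-10%) to 9 (90-100%)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("activation must lie between 0 and 1")
    # A value of exactly 1.0 is placed in the top bin rather than an 11th bin.
    return min(int(value * n_bins), n_bins - 1)


def bin_label(index, n_bins=10):
    """Human-readable percentage range for a bin index."""
    width = 100 // n_bins
    return f"{index * width}-{(index + 1) * width}%"


top = bin_label(confidence_bin(0.95))
bottom = bin_label(confidence_bin(0.05))
```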





Table 2. Model information for each of the six classes 

Class   Model Type   Loss at Exit   # Training   # Validation 
BOTS    5-Band       0.2127         10           3 
        3-Band       0.1709         10           3 
AGR     5-Band       0.5435         6            2 
        3-Band       0.5172         6            2 
SAND    5-Band       0.0235         6            2 
        3-Band       0.2002         6            2 
KELP    5-Band       0.1959         5            3 
        3-Band       0.8499         5            3 
SB      5-Band       0.8855         5            3 
        3-Band       0.8692         5            3 
SZ      5-Band       0.3207         5            3 
        3-Band       0.1413         5            3 


A. SAND 

The 5-band model for sand is named Sand_5band_model.h5 and the 3-band 
model is Sand_3band_model.h5. Both models classified sand with a high degree of 
accuracy. Figure 17 shows that the 3-band and 5-band models both had accuracies 
above 90%. The models distinguished areas of debris on the beach from the sand 
and correctly distinguished sand from swash zone, which is the wet sand region of 
runup at the top of the image. 

It is unclear what role the extra bands in the 5-band imagery had in this 
result. The image is a smooth area of sand, close to the water, and the 3-band model 
correctly classified the lower right-hand portion of the image with a higher 
accuracy than the 5-band model. The 3-band model also correctly assigned the SZ a 
lower percentage than the 5-band model did. 


Figure 17. Ground truth sand image on left with 3-band versus 5-band classification 


B. BOTTOM OTHER THAN SAND (BOTS) 

The 5-band model for BOTS is named BOTS_5band_model.h5 and the 3-band 
model is BOTS_3band_model.h5. Classification of BOTS was a challenge for 
both models. Figure 18 shows that the 5-band model represented the outline of the 
BOTS area, whereas the 3-band model did not. Overall, accuracies were low, as both 
models classified the BOTS areas with 10-20% accuracy. 


[Figure 18 panels: Raw BOTS Image; 3 band classification; 5 band classification] 

Figure 18. Ground truth BOTS image on left with 3-band versus 5-band classification 


C. SANDY BOTTOM (SB) 

The 5-band model for SB is named SB_5band_model.h5 and the 3-band 
model is SB_3band_modelout.h5. The 5-band model was able to classify SB areas 
with 60-70% confidence and the 3-band model performed at 50-60% (Figure 19). 
The 5-band model did a much better job at identifying the BOTS areas, whereas the 
3-band model did not identify any regions of BOTS. 



Figure 19. Ground truth SB image on left with 3-band versus 5-band classification 








D. SWASH ZONE (SZ) 

The 5-band model for SZ is named SZ_5band_model.h5 and the 3-band 
model is SZ_3band_model.h5. Both models were able to classify SZ with a high 
degree of confidence. This was the most successful model in terms of confidence, 
with both models being able to classify SZ with a 90-100% confidence for most of 
the swash zone area (Figure 20). 

The 3-band model did a slightly better job of classifying SZ than the 5-band 
model. The 3-band model has a larger area of the swash zone correctly identified, 
whereas the 5-band model does not have the entire area in the highest classification 
bin. Also, the 3-band model correctly classified areas outside of the SZ in the 0-10% 
confidence bin whereas the 5-band model classified those areas as SZ with a 10-20% 
confidence. 



Figure 20. Ground truth SZ image on left with 3-band versus 5-band classification 


E. KELP 

The 5-band model for kelp is named kelp_5band_model.h5 and the 3-band 
model is kelp_3band_model.h5. The kelp models showed the biggest discrepancy 
between the 3-band and 5-band models. The 5-band model was able to classify kelp 
with a 70-80% confidence, while the 3-band model failed to classify kelp with any 
real accuracy (Figure 21). The 3-band model did register a 40-50% confidence, but 
the outline of this area hardly resembles the kelp outline, which is clearly delineated 
by the 5-band model. The ground truth image on the left in Figure 21 has the NIR 
represented as red, Red Edge represented as green, and red represented as blue (i.e., 
the “yellow” kelp is high in both NIR and Red Edge). The confidence of the 5-band 
model decreased towards the edges of the kelp, where the kelp blades begin to 
submerge in the water. 



[Figure 21 panels: Raw Kelp False Color Image; 3 band classification; 5 band 
classification. Color bar: 0 to 100.] 

False color in this image denotes NIR as the red band, RedEdge as the green band, 
and red as the blue band. 

Figure 21. Ground truth kelp image on left (false color) with 3-band versus 5-band 
classification. 

F. ABOVE GROUND ROCK (AGR) 

The 5-band model for AGR is named AGR_5band_model.h5 and the 3-band 
model is AGR_3band_model.h5. AGR was another class where there was a stark 
difference between the 5-band and 3-band models. The 5-band model was able to 
classify AGR with a 50-60% confidence, whereas the 3-band model was only able to 
achieve a 20-30% confidence, and its contours represent the rock poorly 
(Figure 22). 


Figure 22. Ground truth AGR image on left with 3-band versus 5-band classification 

G. CLASS ACCURACY COMPARISON 

All classes were compared to the ground truth MLC pixel label in two separate, 
topographically different images that contained multiple classes. In the rocky image 
(Figure 15), the 5-band model had higher accuracies than the 3-band model in all six 
classes. Figure 23 shows the kelp comparison in the rocky image. The 3-band model was 
not able to classify any kelp in the image, whereas the 5-band model was able to classify 
kelp with a 72.4% accuracy when compared to the ground truth MLC output. 


Table 3. Accuracies for the rocky image, as seen in Figure 15 

Class   5-band   3-band 
SB      86.4%    65.9% 
SZ      85.3%    76.8% 
Sand    97.8%    91.3% 
Kelp    72.4%    0% 
BOTS    91.5%    81.1% 
AGR     64.7%    58.1% 

Accuracies were obtained by comparing the MLC output of each image (5-band and 
3-band) to the rule classifier of each class with the threshold applied. 



The top two images are the 5-band images, and the bottom two are the 3-band 
images of kelp. The images on the left on both rows are the rule classified image 
obtained from a neural network classification; the images on the right on both 
rows are the ground truth MLC output. 

Figure 23. Accuracy comparison of kelp in the 5-band (top) and 3-band (bottom) models 


In the sandy image (Figure 16), the 5-band model also had higher accuracies 
than the 3-band model in 4 of the 5 classes. The 3-band model performed slightly 
better in the SZ area than the 5-band model. Figure 24 shows an example of SB within 
the beach image. The 3-band SB model (bottom left) misclassified the area containing 
BOTS as SB, while the 5-band model did not classify the BOTS areas as SB. 




Table 4. Accuracies for the sandy image 

Sandy Image Accuracies 

Class   5-band   3-band 
SB      69.6%    65.9% 
SZ      74.8%    76.8% 
Sand    92.8%    91.3% 
Kelp    75.9%    0% 
BOTS    85.8%    81.1% 
AGR     N/A      N/A 

Accuracies were obtained by comparing the MLC output of each image (5-band and 
3-band) to the rule classifier of each class with the threshold applied. The class AGR 
was not visible in this image and therefore not classified. 



The top two images are the 5-band images, and the bottom two are the 3-band 
images of SB. The images on the left on both rows are the rule classified images 
obtained from a neural network classification; the images on the right on both 
rows are the ground truth MLC output. 

Figure 24. Accuracy comparison of SB in the 5-band (top) and 3-band (bottom) models 





V. DISCUSSION 


Overall, this study shows that a DCNN, in this case ENVINet 5, is a powerful 
tool that, in some cases, can be used to accurately classify different regions within 
the littoral area. This study focused on imagery taken on two specific days, but the 
technology shows that CNNs can be used in a variety of ways, including change 
detection of the littoral area. In addition, multispectral data in CNNs show better skill 
for class prediction than three band data (Tables 3 and 4). 5-band imagery gives the 
user a large advantage compared to 3-band imagery when attempting to classify 
images with vegetation. The 5-band models of Kelp, AGR, and SB outperformed 
their 3-band counterparts. The 3-band and 5-band models of each class were trained 
with the same labels to be able to directly compare them and to lessen the number of 
variables between the models. 

There is a wide gap in the applicability of 3-band versus 5-band imagery. 
Having the additional Red Edge and NIR spectral bands makes the classification of 
littoral areas much more accurate. In the kelp comparison, using the NIR and Red 
Edge bands of the 5-band imagery enabled the accurate labeling of kelp through the 
n-D visualizer. It is almost impossible to accurately label the kelp in the 3-band 
image, even using the n-D visualizer (Figure 9). Without the NIR and Red Edge 
bands, there is nothing that significantly jumps out as being the pixels related to kelp 
in order to complete an accurate 3-band label. This demonstrates identification of 
obstacles within littoral waters that span the water column, providing additional 
information to the warfighter. 

Ideally, the ground truth, which was represented by the 3-band and 5-band 
MLC outputs, should have been the same, since the same hand-labeled ground truth 
image was sent to the MLC algorithm for classification. In order to run the 3-band 
MLC algorithm, two of the bands had to be removed: bands 4 and 5, the Red Edge 
and NIR bands. It is unclear why the MLC algorithm produced different results. One 
reason may be the inherent randomness of machine learning: every time an 
algorithm or neural network model is run, no two outputs will be exactly the same. 
However small the differences between the 3-band and 5-band MLC outputs are, 
they could have played a role in the accuracies that were obtained. 

The AGR class is another area that benefitted from the 5-band imagery, where 
the NIR and Red Edge bands were available. In the littoral area, rocks are a natural 
habitat for flora and fauna (Claudino-Sales 2018). Flora reflects NIR strongly, and so 
does fauna but to a lesser extent (Cuesta and Lobo 2019). In this study, there were 
numerous images where rocks were almost invisible in the RGB because they 
blended in with the water, but with NIR and Red Edge applied to the photo, the 
rocks became much more visible (Figures 25 and 26). 



Figure 25. Original image of AGR 


Figure 26. AGR false color image with NIR as red, RedEdge as green and red as blue 


When comparing the 3-band versus 5-band models directly against each 
other, it is evident that the 5-band models outperformed their 3-band counterparts. 
The SZ 5-band and 3-band models were comparable, although the 3-band model had 
higher confidence in a larger part of the swash zone than the 5-band model and also 
correctly had lower confidence in areas that were not SZ, the areas over the water and 
sand. 

Sand was the only class where the 3-band model outperformed the 5-band 
model, but both model performances were above 95%. In the left image of 
Figure 17, the sand in the bottom right of the image had a lower confidence in the 
5-band classification than in the 3-band image. The sand pixels in this region are the 
brightest of all of the sand in the image. It seems that the 5-band sand model is more 
sensitive to these changes of intensity than the 3-band model. These results may 
change if additional examples of the sand class are included. These results are 
consistent with previous work suggesting sand is a difficult landscape to classify 
(Herrmann, 2018). 

The 5-band SB model was able to delineate SB from BOTS. Further training 
of the SB model should improve the accuracy with which the model is able to 
distinguish SB from BOTS. In retrospect, only one model is needed for bottom type 
to distinguish SB from BOTS (the label is inherently mutually exclusive). Using the 
classification raster that is generated from creating a TensorFlow mask classification, 
the areas that are classified with high accuracy as SB can be labeled SB, and the areas 
that have the lowest classification accuracy can be labeled BOTS. This would have 
saved time training separate models when these two parameters could have been 
evaluated from just two models instead of attempting to train four (two 5-band and 
two 3-band, one for each). 

It is unclear why the BOTS models had such low accuracy. Multiple model 
runs with different combinations of the number of labels (training and validation) 
were attempted, and a working model could only be obtained with the combination 
of 10 training and 3 validation labels. The ENVI software created by Harris 
Geospatial Solutions is remarkable for the number of options available to the user to 
analyze imagery. However, the deep learning toolbox is still in its infancy, and
future versions of the toolbox will undoubtedly produce better results.

Tables 3 and 4 showed that the 5-band model had better accuracies than the
3-band model in almost every class, the exception being swash zone in the beach
image. The 3-band SZ model was two percentage points more accurate than the
5-band SZ model, 76.8% compared to 74.8%. This indicates that both models
categorized SZ with high accuracy, similar to sand. Kelp was not classified at
all in the 3-band model. This was the most striking difference, as the 5-band
model was able to classify kelp with high accuracies, 72.4% in the rocky image
and 75.9% in the beach image.

It is interesting to see that the 3-band model classified BOTS relatively well
in both images (81.1% for both images compared to the 5-band accuracies of 91.5%
in the rocky image and 85.8% in the beach image) and classified AGR well in the
rocky image (58.1% compared to the 5-band accuracy of 64.7%).
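
The size of these gaps is easier to compare when expressed in percentage points. The short sketch below uses only the accuracies quoted in the text from Tables 3 and 4; kelp is left out because the 3-band model produced no kelp classification to compare against. The dictionary layout and function names are illustrative choices, not part of the study.

```python
# Accuracies (percent) quoted in the text from Tables 3 and 4.
# Keys are (class, image); values are (3-band, 5-band).
QUOTED = {
    ("SZ", "beach"): (76.8, 74.8),
    ("BOTS", "rocky"): (81.1, 91.5),
    ("BOTS", "beach"): (81.1, 85.8),
    ("AGR", "rocky"): (58.1, 64.7),
}


def gap_in_points(three_band, five_band):
    """Return the 5-band minus 3-band accuracy in percentage points."""
    return round(five_band - three_band, 1)


def summarize(accuracies):
    """Map each (class, image) pair to its percentage-point gap."""
    return {key: gap_in_points(*pair) for key, pair in accuracies.items()}


if __name__ == "__main__":
    for (cls, image), gap in summarize(QUOTED).items():
        leader = "5-band" if gap > 0 else "3-band"
        print(f"{cls:5s} {image:6s} {gap:+5.1f} points ({leader} higher)")
```

A negative gap marks a case where the 3-band model was more accurate, which among the quoted values happens only for SZ in the beach image.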

Although the 3-band SB accuracies were high, it is clear from Figure 24 that the
5-band model did a much better job of depicting the area of SB. The 3-band model
was not able to delineate the SB from the BOTS areas, and incorrectly labeled
the BOTS areas as SB.

For the AGR class, although the 3-band model did relatively well, the 5-band
model was able to use the NIR and Red Edge bands to more accurately classify AGR
that had vegetation growing on it, because vegetation reflects strongly in the
NIR and Red Edge bands.
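
One common way to see why the extra bands help is a normalized-difference index, which contrasts the strong NIR reflectance of vegetation with its weaker red or Red Edge reflectance. The sketch below computes the standard NDVI and NDRE ratios. It is offered only as an illustration: this study does not report computing any spectral index, and the reflectance values in the demonstration are invented.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b) per pixel, returning 0 where a + b == 0."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    out = np.zeros_like(total)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index: (NIR - RE) / (NIR + RE)."""
    return normalized_difference(nir, red_edge)


if __name__ == "__main__":
    # Invented reflectances: first pixel vegetated rock, second bare rock.
    nir = np.array([0.50, 0.30])
    red = np.array([0.10, 0.25])
    print(ndvi(nir, red))
```

In this toy case the vegetated pixel scores far higher than the bare one, a contrast that a 3-band RGB model cannot observe directly because it never sees the NIR band.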

Many of the images that were taken on the two flying days could not be used for
training and/or validation. One reason for this was sun glint. The level of sun
glint varied among the images; images where features could still be
distinguished were kept, while images dominated by sun glint were discarded. A
second reason was the use of raw imagery. Many of the images, even those without
sun glint, were so dark that no distinguishing features could be observed.
Linear histogram stretches were applied to these images to brighten them so that
features of interest could be observed. This produced positive results for some
images, which were sent to training folders because the stretch made some
features recognizable, but images on which the histogram stretch had no effect
had to be discarded.
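
A linear histogram stretch of the kind mentioned above can be sketched as follows. This is a generic percentile-based version written with NumPy, not the ENVI routine that was applied to the imagery, and the 2nd and 98th percentile cutoffs are assumed defaults chosen for illustration.

```python
import numpy as np


def linear_stretch(band, low_pct=2.0, high_pct=98.0):
    """Linearly rescale a single band to [0, 1] between two percentiles.

    Values below the low percentile map to 0 and values above the high
    percentile map to 1. The 2/98 defaults are assumptions for illustration.
    """
    band = np.asarray(band, dtype=float)
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi <= lo:
        # Flat band: there is no contrast to stretch, so return zeros.
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


if __name__ == "__main__":
    dark = np.array([[10.0, 12.0], [14.0, 30.0]])
    print(linear_stretch(dark, low_pct=0.0, high_pct=100.0))
```

The flat-band branch corresponds to the case described in the text where the stretch has no effect: if nearly all pixels share the same value, rescaling cannot recover features that were never recorded.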

Several of the models, including SB, AGR, and BOTS, had accuracies below 70%.
Although the models were able to extract those features from the training, they
did so at a level below the 80% threshold that was targeted. Overall, the
results of this study suggest that DCNNs are a powerful tool for feature
extraction from images of the littoral area and, furthermore, that the ability
to use 5-band imagery when training a neural network on images in the littoral
area is essential.





VI. CONCLUSION 


For naval operations, it is important to know the existing conditions of the
littoral zone in order to conduct safe operations (ingress/egress) or map change
following a disaster. The first objective of this research was to use a DCNN to
classify the littoral area with respect to six classes: sand, kelp, above ground
rock, swash zone, sandy bottom, and bottom other than sand. The second objective
was to determine if a DCNN trained on 5-band imagery is better able to classify
these classes than a DCNN trained on 3-band imagery.

The DCNN used for this study was able to classify each of the six classes,
although the accuracy of the models varied. DCNNs trained on 5-band imagery
outperformed their 3-band counterparts in all classes except for sand. The UAS
used for this study allowed for ease of data collection and provided imagery
with high spatial and temporal resolution. A system with features similar to the
one used in this study would enhance the ability of naval forces to get a quick
look at the littoral area for which future operations are being planned.

Future studies in this research area can focus on using UAS imagery coupled with
DCNNs to conduct change detection research in coastal littoral areas. Another
useful research topic would be to use DCNNs with panchromatic high-resolution
images from satellites like WorldView-3 and compare the resulting littoral
classification with that obtained from UAS imagery. As discussed previously,
satellites like WorldView-3 are now able to nearly match UAS in temporal
resolution (~1 day for most areas), although they are not quite able to match
UAS in spatial resolution (0.3 m for WorldView-3 compared to 0.1 m for UAS)
(Collin et al. 2019).





LIST OF REFERENCES 


Armenteros, M., and D. Saladrigas, 2018: The role of habitat selection on the 
diversity of macrobenthic communities in three gulfs of the Cuban 
Archipelago. Bulletin of Marine Science, 94, 2, 249-268, 
https://doi.org/10.5343/bms.2017.1013. 

Bostater, C. R., and T. Rotkiske, 2018: Influence of bottom depths and bottom types
on water surface reflectance. Proceedings of SPIE, 10784, B1-B11,
https://doi.org/10.1117/12.2515669.

Buscombe, D., and A. C. Ritchie, 2018: Landscape classification with deep neural 
networks. Geosciences, 8, 244, 1-23, 
https://doi.org/10.3390/geosciences8070244. 

Claudino-Sales, V., 2018: Malpelo Fauna and Flora Sanctuary, Colombia. Coastal
World Heritage Sites, Coastal Research Library, 28. Springer, Dordrecht, 
315-320. 

Collin, A. M., M. Andel, D. James, and J. Claudet, 2019: The
superspectral/hyperspatial WorldView-3 as the link between spaceborne
hyperspectral and airborne hyperspatial sensors: The case study of the complex
tropical coast. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. - ISPRS
Arch., 42, 1849-1854,
https://doi.org/10.5194/isprs-archives-XLII-2-W13-1849-2019.

Cuesta, E., and J. M. Lobo, 2019: Visible and near-infrared radiation may be 
transmitted or absorbed differently by beetle elytra according to habitat 
preference. PeerJ, 7, e8104, 1-16.

De Juan, S., C. Lo Iacono, and M. Demestre, 2013: Benthic habitat characterisation 
of soft-bottom continental shelves: Integration of acoustic surveys, benthic 
samples and trawling disturbance intensity. Estuar. Coast. Shelf Sci., 117, 
199-209, https://doi.org/10.1016/j.ecss.2012.11.012.

Flynn, K. F., and S. C. Chapra, 2014: Remote sensing of submerged aquatic
vegetation in a shallow non-turbid river using an unmanned aerial vehicle.
Remote Sens., 6, 12815-12836, https://doi.org/10.3390/rs61212815.

Goldsborough, P., 2016: A tour of tensorflow. arXiv preprint arXiv:1610.01178, 16 
pp. https://arxiv.org/abs/1610.01178. 

Harris Geospatial Solutions, 2019: Classification. Accessed 08 October 2019, 
https://www.harrisgeospatial.com/docs/Classification.html. 





Hermann, D., 2018: Morphodynamic Classification of Coastal Regions Using Deep 
Learning through Digital Imagery Collection. M.S. thesis, Dept. of
Oceanography, The Naval Postgraduate School, 49 pp. 

Holland, K. T., J. A. Puleo, N. Plant, and J. M. Kaihatu, 2002: Littoral
environmental nowcasting system (LENS). Ocean. Conf. Rec., 1, 85-91,
https://doi.org/10.1109/oceans.2002.1193252.

Hoonhout, B. M., M. Radermacher, F. Baart, and L. J. P. van der Maaten, 2015: An 
automated method for semantic classification of regions in coastal images. 
Coast. Eng., 105, 1-12, https://doi.org/10.1016/j.coastaleng.2015.07.010.

Hu, F., G. S. Xia, J. Hu, and L. Zhang, 2015: Transferring deep convolutional 
neural networks for the scene classification of high-resolution remote 
sensing imagery. Remote Sens., 7, 14680-14707,
https://doi.org/10.3390/rs71114680. 

Hugenholtz, C. H., K. Whitehead, O. W. Brown, T. E. Barchyn, B. J. Moorman, A. 
LeClair, K. Riddell, and T. Hamilton, 2013: Geomorphological mapping 
with a small unmanned aircraft system (sUAS): Feature detection and 
accuracy assessment of a photogrammetrically-derived digital terrain model. 
Geomorphology, 194, 16-24, 
https://doi.org/10.1016/j.geomorph.2013.03.023.

Kohler, P. E. C., L. Leblanc, and J. Elliott, 2016: SCOOP - NDBC’s new ocean 
observing system. Ocean. 2015 - MTS/IEEE Washingt., pp. 1-5, 
https://doi.org/10.23919/oceans.2015.7401834. 

Liu, T., 2019: Understanding Random Forest. Accessed 17 October 2019,
https://towardsdatascience.com/understanding-random-forest-58381e0602d2.

Mabus, R. E., 2015: A Cooperative Strategy For 21st Century Seapower, March
2015. Accessed 18 October 2019,
https://www.navy.mil/local/maritime/150227-CS21R-Final.pdf.

Madonsela, S., M. A. Cho, A. Ramoelo, and O. Mutanga, 2017: Remote sensing of 
species diversity using Landsat 8 spectral variables. ISPRS J. Photogramm. 
Remote Sens., 133, 116-127, https://doi.org/10.1016/j.isprsjprs.2017.10.008.

Micasense, Inc., 2019: Best practices: Collecting Data with Micasense Sensors.
Accessed 13 November 2019,
https://support.micasense.com/hc/en-us/articles/224893167-Best-practices-Collecting-Data-with-MicaSense-RedEdge-and-Parrot-Sequoia.





O’Shea, K., and R. Nash, 2015: An Introduction to Convolutional Neural Networks.
arXiv preprint arXiv:1511.08458, 11 pp.
https://arxiv.org/abs/1511.08458.

Ronneberger, O., P. Fischer, and T. Brox, 2015: U-net: Convolutional networks for 
biomedical image segmentation. Lect. Notes Comput. Sci. (including Subser. 
Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), 9351, 234-241, 
https://doi.org/10.1007/978-3-319-24574-4_28. 

Salamati, N., D. Larlus, G. Csurka, and S. Susstrunk, 2012: Semantic image
segmentation using visible and near-infrared channels. Lect. Notes Comput.
Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics),
7584 LNCS, 461-471, https://doi.org/10.1007/978-3-642-33868-7_46.

Scooler, J., 2017: Episodic changes in lagoon water levels due to ephemeral
river breaching and closure events. M.S. thesis, Dept. of Oceanography,
Naval Postgraduate School, 47 pp.

Tulldahl, H. M., P. Philipson, H. Kautsky, and S. A. Wikstrom, 2013: Sea floor
classification with satellite data and airborne lidar bathymetry. Ocean Sens.
Monit. V, 8724, 87240B, https://doi.org/10.1117/12.2015727.





INITIAL DISTRIBUTION LIST 


1. Defense Technical Information Center
Ft. Belvoir, Virginia 

2. Dudley Knox Library 
Naval Postgraduate School 
Monterey, California 





